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Abstract. Non-gaussian spatial data are very common in many disciplines. 
For instance, count data are common in disease mapping, and binary data 
are common in ecology. When fitting spatial regressions for such data, one 
needs to account for dependence to ensure reliable inference for the regression 
coefficients. The spatial generalized linear mixed model (SGLMM) offers a very 
popular and flexible approach to modeling such data, but the SGLMM suffers 
from three major shortcomings: (1) unintcrprctability of parameters due to 
spatial confounding, (2) variance inflation due to spatial confounding, and (3) 
high-dimensional spatial random effects that make fully Bayesian inference for 
such models computationally challenging. We propose a new parameterization 
of the SGLMM that alleviates spatial confounding and speeds computation by 
greatly reducing the dimension of the spatial random effects. We illustrate the 
application of our approach to simulated binary, count, and Gaussian spatial 
datasets, and to a large infant mortality dataset. 



Dimension reduction; Generalized linear model; Mixed model; Regression; Spa- 
tial statistics 



1. Introduction 



Nearly 20 years ago Besag, York, and Mollie (1991) proposed the spatial gener- 



alized linear mixed model (SGLMM) for areal data, a hierarchical model that intro- 
duces spatial dependence through a latent Gaussian Markov random field (GMRF). 
Although |Besag et al.| focused only on prediction for count data, the SGLMM for 
areal data has since been applied to other types of data (e.g., binary), in many 
fields (e.g., ecology, geology, forestry), and for regression as well as prediction. In 
fact, the SGLMM has long been the dominant areal model, owing to its flexible 
hierarchical specification; to the availability of the WinBUGS software application 



(Lunn, Thomas, Best, and Spiegelhalter 2000), which greatly eases data analy- 



sis; and to the various theoretical and computational difficulties that plague the 



areal SGLMM's nearest competitor, the automodel (see, e.g., Hughes, Haran, and 



Caragea 2010 Kaiser and Cressie 1997 2000). 



SGLMMs, while remarkably flexible and widely applicable, suffer from three ma- 
jor shortcomings: (1) uninterpretability of parameters due to spatial confounding, 
(2) variance inflation due to spatial confounding, and (3) computational challenges 
posed by high-dimensional latent variables. The issue of spatial confounding be- 
tween the fixed and random effects was pointed out recently by [Reich, Hodge s, and 
Zadnik (2006), and they suggested a new model (henceforth RHZ) that seeks to 



alleviate confounding by explicitly introducing synthetic predictors that are orthog- 
onal to the fixed effects predictors. But the RHZ model, by failing to account for 
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the underlying graph, permits structure in the random effects that corresponds to 
negative spatial dependence, i.e., repulsion, which we do not expect to observe in 
the phenomena to which these models are typically applied. 

We propose a new model that mitigates confounding while allowing only positive 
spatial dependence among the random effects. We achieve this by exploiting what 
we believe to be the "natural" geometry for these models. By utilizing this geometry 
we are also able to dramatically reduce the dimension of the random effects, which 
speeds computation to the extent that our model makes feasible the analyses of 
large areal datasets. 



Several recent papers have focused on dimension reduction for point-level, i.e., 



Gaussian process-based, spatial models (cf., e.g., Higdon 2002 


Cressie and Jo- 


hannesson[ |2008 ; Bancrjee, Gelfand, Finley, and Sang 


2008 Furrer, Genton, and 


Nychka| |2006| |Rue and Tjelmeland||2002). To our knowledge, this paper is the first 



to propose a principled approach to dimension reduction for areal models (while 
also improving regression inference). 



The rest of the paper is organized as follows. In Section [2] we review the tradi- 
tional SGLMM. In Section [3] we discuss spatial confounding and review the model 
proposed by | Reich et al.| In Section [4] we present our sparse reparameterization of 
the areal SGLMM. In Section [5] we discuss dimension reduction for spatial models. 
In Section[6]we discuss the results of our simulation study, which applied each of the 
three models to simulated binary, count, and Gaussian spatial data. In Section [7] 
we apply our sparse model to a large US infant mortality dataset. We conclude 
with a discussion. 



2. The Traditional SGLMM 



|Besag et al. ( 1991 ) formulated their SGLMM for areal data as follows. Let 
G = (V, E) be the underlying undirected graph, where V — {1,2, ... ,n} are the 
vertices (each of which represents an area of interest) and E are the edges (each of 
which is a pair, that represents the proximity of areas i and j). In the sequel 

we shall represent G using its adjacency matrix, A, which is the n x n matrix with 
entries given by diag(A) = and Ay = G E, i ^ j}, where 1{-} denotes 

the indicator function. 

Now, let Z = (Zi, . . . , Z n )' be the random field of interest, where Zi is the 
random variable associated with vertex i. Then the first stage of the model is given 

by 



(1) 



g(E(Z i \0,W i ))=X i p + Wi, 



where g is a link function, Xj is the ith row of the design matrix, /3 is a p- vector of 
regression parameters, and Wi is a spatial random effect associated with vertex i. 
As is the case for classical generalized linear models, different types of data imply 
different (canonical) choices of the link function, g. For example, the canonical 
link function for binary spatial data is the logit function, logit(p) = log(p/(l —p)), 
for count data it is the natural logarithm function, and for normal data it is the 
identity function (Nelder and Wedderburn 1972). In the latter case we have the 



spatial linear mixed model (SLMM), which is of considerable interest (see, e.g., 



Cressie 1993 Banerjee, Carlin, and Gelfand 2004) 



The field of random effects, W = {W\, . . . , W n )' , whereby the traditional model 
incorporates spatial dependence, is assumed to follow the so-called conditionally 
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autoregressive (CAR) or Gaussian Markov random field prior 
(2) P (W\t)(x 



T ra„k(Q)/2 exp 



-W'QW 



where r is a smoothing parameter and Q = diag(Al) — A is a precision matrix 
(1 is the conformable vector of Is). Note that Q intuitively incorporates both 
dependencies (Wj and Wj, i ^ j, are independent given their neighbors iff Qy = 
Qji = iff E) and prior uncertainty (our uncertainty about Wi is inversely 

proportional to the degree of vertex i: Qu = A^l). 

As described in the seminal paper by Diggle, Tawn, and Moyeed (1998), an 
SGLMM for spatial data over a continuous domain, i.e., for point-level data, can 
be formulated by replacing the GMRF specification with that of a Gaussian process 



(also see De Oliveira 2000 Christensen and Waagepetersen 2002). See Banerjee 



et al.| (2004), Rue and Held| ( |2005[ ) , or |Haran| ( |2010[ ) for an introduction to the use 



of Gaussian random field models in spatial statistics. 

Since |2]) is improper (Q is singular) the traditional areal model restricts us 
to a Bayesian analysis, although it is possible to use a maximum likelihood-based 
approach if a proper GMRF distribution is used instead of this so-called intrinsic 
GMRF (see Besag and Kooperberg 19951. The models described below in Sec- 



tions [3] and [4jhave CARs with invertible precision matrices, and so those models 
lend themselves to both classical and Bayesian analyses. In the interest of concision 
we use only Bayesian terminology in the sequel. 



3. Spatial Confounding 

We begin with a discussion of confounding for the areal SGLMM's closest com- 
petitor, the automodel, because we believe this offers insight regarding confounding 
for the SGLMM. The automodel, a Markov random field model that incorporates 
dependence directly rather than hierarchically, can be specified as 

(3) g(E(Z l \(3, V ,Z_ i j)=X l (3 + V ]T Z* , 

(iJ)eE 

where Z_i denotes the field with the ith observation excluded and 77 is the depen- 
dence parameter (77 > implies an attractive model, 77 < a repulsive model) . For 
the so-called uncentered automodel, Z* = Zj , while Z* = Zj — fj,j for the centered 
automodel, where y,j is the independence expectation of Zj, i.e., fj,j = E(Zj \ (3, r\ = 
0)=. 9 - 1 (X J /3). 

The sum in Q is called the autocovariate. We see that it is a synthetic predictor 
that employs only the observations themselves or the observations along with the 
posited regression component, X/3, of the model. The centered form of the autoco- 
variate leads easily to an interpretation of the dependence parameter: 77 measures 
the "reactivity" of an observation to its neighbors, conditional on the hypothesized 
regression component. This interpretation of 77 lends (3 their desired interpretation 
as regression parameters. Put another way, the purpose of the centered autocovari- 
ate is to fit small-scale structure in the data, by which we mean structure residual to 
the large-scale structure represented by X/3. Evidently the centered autocovariate 
is well suited to this role, since 77 tends to be at most weakly correlated with /3. 

The uncentered automodel, on the other hand, exhibits both conceptual and 
spatial confounding. It is not clear how one should interpret the uncentered auto- 
covariate, and so 77 and /3 are also difficult to interpret. Moreover, 77 and (3 tend to 
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be strongly correlated for the uncentered model. (See Caragea and Kaiser (2009) 



for a treatment of confounding in the context of the autologistic model.) 



Reich et al. ( 2006 ) employed a reparameterization to show that the traditional 
SGLMM, like the uncentered automodel, is confounded. More specifically, they 
showed that introduction of the random effects tends to bias the posterior distri- 
bution of (3 and also inflate its variance. This is because the traditional model 
implicitly contains predictors that arc collincar with X, and it is this collincarity 
that causes the bias and variance inflation. To see this, let P be the orthogonal 
projection onto span(X), i.e., P = X(X'X) _1 X', and let P -1 be the projection 
onto span(X)'s orthogonal complement: P^ = I P. Now, spectrally decompose 
these operators to acquire orthogonal bases, K nxp and L nx („_ p ), for span(X) and 
span(X)- 1 -, respectively. These bases allow us to rewrite |lj as 

5 (E(Z, | /3, Wi)) = Xi/3 + Wi = Xi/3 + K i7 + M, 

where 7 and S are random coefficients. This form exposes the source of the spatial 
confounding: K is collinear with X. 



Since the offending predictors, K, have no scientific meaning, Reich et al. sug- 
gest that they be deleted from the model. Setting 7 = leads to the following 
specification. For the first stage we have 

g(E{Z i \p,S))=X i p + L i 8. 

And the prior for the random effects, d, is now 

p(<5|T)ocr("-f)/ 2 exp(-^'Q^) , 

where Qr = L'QL. 

|Reich et al.| refer to this as smoothing orthogonal to the fixed effects, and they 
showed that their model does, in principle, address the aforementioned problems. 
More specifically, adding this restricted CAR to the nonspatial model adjusts, but 
does not inflate, the variance of /3's posterior but leaves the mean unchanged. We 
also note that RHZ reduces slightly the number of model parameters, from n+p+1 
to n + 1. 

The traditional SGLMM and RHZ take essentially the same approach as the 
automodel in the sense that all of these models augment the linear predictor with 
synthetic predictors. The traditional model can be considered to implicitly employ 
the synthetic predictors K and L, and RHZ explicitly employs L. The traditional 
SGLMM is analogous to the uncentered automodel in that both models introduce 
predictors — K and the uncentered autocovariate, respectively — that cause spatial 
confounding and thus render /3 uninterpretable. And RHZ is analogous to the 
centered automodel in that both models introduce predictors designed to fit only 
residual structure in the data. 



4. A Sparse Reparameterization of the Areal SGLMM 

Although the synthetic predictors for RHZ are readily interpreted, receiving their 
interpretation from the theory of linear models, they are not particularly well suited 
to their task of fitting the residual clustering that arises due to spatial dependence. 
This is because the geometry corresponding to the operator P 1 - neglects the un- 
derlying graph, G. In this section we reparameterize the areal SGLMM using an 
alternative operator that we believe corresponds to the intrinsic geometry of these 
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models. The random effects for our new model (1) permit patterns correspond- 
ing to positive spatial dependence only, i.e., repulsion is disallowed, and (2) have 
dimension much smaller than n. 



In an attempt to reveal the structure of missing spatial covariates, Griffith ( 2003 ) 
augments a generalized linear model with selected eigenvectors of (I— 11 /n)A(I — 
11 /n), where I is the n x n identity matrix and 1 is the n- vector of Is. This 
operator appears in the numerator of Moran's I statistic, 

n Z f (I- 11' /n)A(I- 11' /n)Z 

1 j i'ai z'(i-u'/ n )z 



a popular nonparametric measure of spatial dependence (Moran 1950). We see 
that 1(A) is a scaled ratio of quadratic forms in e = Z — Zl. The numerator of 
the ratio is the squared length of e in the elliptical space corresponding to A, while 
the denominator is the squared length of e in a spherical space. 

Since we are interested not in missing covariates but in smoothing orthogonal to 
X (which may or may not contain 1), we replace I — ll'/n with P^. The resulting 
operator, P i AP i , which we call the Moran operator for X with respect to G, 
appears in the numerator of a generalized form of Moran's /: 

n Z'V^AV^Z 

x( } ~ Yai z'p^z • 



Boots and Tiefelsdorf (2000) showed that (1) the (standardized) spectrum of a 
Moran operator comprises the possible values for the corresponding ix(A), and 
(2) the eigenvectors comprise all possible mutually distinct patterns of clustering 
residual to X and accounting for G. The positive (negative) eigenvalues correspond 
to varying degrees of positive (negative) spatial dependence, and the eigenvectors 
associated with a given eigenvalue (A^, say) are the patterns of spatial clustering 
that data exhibit when the dependence among them is of degree A^. 

To illustrate the appropriateness of the Moran basis for fitting residual spatial 
clustering, we consider the 30 x 30 square lattice and X = [x y], where the ith row 
of the design matrix contains the x and y coordinates of the ith lattice point, with 
coordinates restricted to the unit square. The panels of Figure [T] show eigenvectors 
7, 13, and 42 (which we chose at random) of the RHZ basis (L7, L13, L42) and of 
the Moran basis (M 7 , M 13 , M 42 ). Each panel displays its eigenvector as a "map" 
by associating the ith component of the vector with the spatial location, (xi, j/j), of 
the ith lattice point. (We note that M7, M13, and M42 are associated with eigen- 
values 0.995, 0.970, and 0.868, respectively, which implies that these eigenvectors 
correspond to strong positive spatial dependence.) Each vector of the RHZ basis 
is relatively flat, with the flatness occasionally broken by a very localized spike or 
dip. The vectors of the Moran basis, on the other hand, show clear and regular 
patterns of spatial variation at various scales. 

We henceforth assume that the matrix M contains q <C n eigenvectors of the 
Moran operator. Since eigenvectors that correspond to inconsequential positive 
dependence, independence, or negative dependence (repulsion) are of no interest, 
we can discard at least half of the eigenvectors in many cases, and our simulation 
study will show that a much greater reduction is often possible. 

Replacing L with M in the RHZ model gives, for the first stage, 

g(E(Zi\p,5 s )) =XiP + MiS s . 




Figure 1. Eigenvectors 7, 13, and 42 from the RHZ and Moran 
bases. The Moran basis appears to be an appropriate basis for 
fitting small-scale structure. 
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Table 1. Comparing and contrasting various spatial generalized 
linear models. 



Model 


Systematic Distortion of Mean 


Confounded 


Accounts for X 


Accounts for G 


Uncentered Automodel 




Yes 


No 


Yes 


Traditional SGLMM 


K l7 + L ( <5 


Yes 


No 


No 


RHZ SGLMM 


us 


No 


Yes 


No 


Centered Automodel 


r)T,(i,j)eE( z j ~ Pi) 


No 


Yes 


Yes 


Sparse SGLMM 




No 


Yes 


Yes 



And the prior for the random effects is now 

p(d s | t) oc r q/2 exp (-^sQs^s 

where Qs = M'QM. This implies p + q + 1 model parameters. 

This sparse model is more closely analogous to the centered automodel than is the 
RHZ model. The uncentered automodel accounts for the underlying graph, since 
the uncentered autocovariate is a sum over neighbors, but does not fit structure 
residual to X. The RHZ model accounts for X but not for the underlying graph. 
Both the centered automodel and our sparse SGLMM account for X and for the 
underlying graph. Table [l] compares and contrasts the five models treated here. 

We note that another popular measure of spatial dependence is Geary's C, which 
uses the graph Laplacian, Q, in place of A (Geary 1954). Geary's C is the spa- 



tial analog of the well-known Durbin- Watson statistic for measuring autocorrela- 
tion in the residuals from a time series regression (Durbin and Watson 1950). 



More specifically, the Durbin- Watson statistic is similar to Geary's C for a path 
graph (where adjacency is in time rather than space). Moran's I corresponds to a 
product-moment formulation, Geary's C to a squared-difference formulation. The 
eigensystem of P-'-QP 1 - is a viable alternative to that of P 1 AP 1 for the cur- 
rent application, and perhaps other matrix representations of G would also provide 
suitable eigensystems. We chose A because the spectrum of a Moran operator is 
particularly easy to interpret. 

5. Dimension Reduction for Spatial Models 

Fitting a Gaussian process model like that mentioned above in Section [2] requires 
the repeated evaluation of expressions involving the inverse of the covariance ma- 
trix, H(4>), where <f> are the parameters of the spatial covariance function. The 
customary approach to this problem is to avoid inversion in favor of Cholesky de- 
composition of H followed by a linear solve. Since H is typically dense, its Cholesky 
decomposition is 0(n 3 ), and so the time complexity of the overall fitting algorithm 
is <d(n 3 ). This considerable computational expense makes the analyses of large 
point-level datasets time consuming or infeasible. Consequently, efforts to reduce 
the computational burden have resulted in an extensive literature detailing many 



approaches, e.g., process convolution (Higdon 2002 ), fixed rank kriging ( Cressie and 



Johannesson 2008), Gaussian predictive process models (Banerjee et al. 20081, co- 



random field ( Rue and Tjelmeland 2002 1 



variance tapering (Furrer et al. 2006), and approximation by a Gaussian Markov 



Fitting an areal mixed model can also require expensive matrix operations. It 
is well known that a univariate Metropolis-Hastings algorithm for sampling from 
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Figure 2. The large-scale structure, X/3, for our simulation study. 



the posterior distribution of W leads to a slow mixing Markov chain because the 
components of W exhibit strong a posteriori dependence. This has led to a number 
of approaches that involve updating the random effects in a block(s). Constructing 
proposals for these updates is challenging, and the improved mixing comes at the 
cost of increased running time per iteration (see, for instance , |Knorr-Held and Rue 



2002 Haran, Hodges, and Carlin 2003 Haran and Tierney 2010). 



The random effects for the RHZ model and for our sparse model, on the other 
hand, are practically a posteriori uncorrelated. This means we can use a spherical 
normal proposal for the random effects, which is very efficient computationally. 
Thus the computational crux of fitting the RHZ (sparse) model is the evaluation of 
the quadratic form S'QrS (S' s Qsfis)- This operation has time complexity O(n) for 
the RHZ model, which is sufficient to discourage or prevent the application of the 
model to large datasets. As we will show in the next section, evaluation of S' s Qs^s 
can be 6(1), which renders our sparse model applicable to even very large datasets. 



6. Simulation Study 

We applied the classical GLM and the three spatial models to binary, count, and 
normal data simulated from the sparse model. (We also conducted an extensive 
study wherein we applied these models to binary, count, and normal data simulated 
from the traditional SGLMM (with a proper GMRF). That study yielded similar 
results, which we have omitted in the interest of brevity.) The underlying graph for 
the binary and count data was the 30 x 30 lattice, and we restricted the coordinates 
of the vertices to the unit square. We chose for our design matrix X = [x y], 
where x — (xi, . . . , xqoo)' and y = (yi, . . . , 3/900)' ar e the x and y coordinates 
of the vertices, and we let (5 — (1,1)'. A level plot of this large-scale structure, 
X/3 = {xi+y\,..., £900 + 2/900)', is shown in Figure [2] 

The plot in Figure [3] shows the standardized eigenvalues for the 30 x 30 lattice. 
Since over half of the values are non-positive, we chose to use only the first 400 
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Figure 3. The standardized eigenvalues for the 30 x 30 lattice. 



eigenvectors to simulate data for our study, i.e., dim^s) = 400 and M is 900 x 400. 
The horizontal gray line marks the 400th eigenvalue, which equals 0.05. 

Since the traditional model requires a Bayesian analysis, we applied all three 
spatial models in a Bayesian setting. It is customary to put a flat prior or diffuse 
spherical normal prior on (3. We assumed that (3 ~ A/"(0, 1001). The choice of a 
prior for r has been the subject of debate. Kelsall and Wakeficld|( 1999 1 recommend 
a gamma prior with shape parameter 0.5 and scale parameter 2000. This prior 
is appealing because it corresponds to the prior belief that the fixed effects are 
sufficient to explain the data (since a large value for r implies small variances for 
the random effects) and because it does not produce artifactual spatial structure 
in the posterior. 

In all cases we simulated a sample path of length 2,000,000. We fit the SGLMMs 
using Metropolis-Hastings random walk updates and/or Gibbs updates. For the 
normal data we used Gibbs updates for all parameters, and we used a Gibbs update 
for t for all three types of data. For the binary and count data we updated (3 
using a random walk with proposal /3^ +1 ' ~ Af(f3^\ V), where V is the estimated 
asymptotic covariance matrix from a classical GLM fit. We updated W using a 
univariate random walk with normal proposals. And we updated each of 5 and 5g 
using a multivariate random walk with a spherical normal proposal. 

6.1. Binary Data. We created a binary dataset by first setting r = 1 and simulat- 
ing random effects according to 5$ ~ A/"(0, Qg 1 )- Then we simulated independent 
observations according to Z \ 8s ~ B(p), where B denotes a Bernoulli random vari- 
able and p = exp(a; + y + M<5,g)/(1 + exp(a; + y + M<5,g)) is the vector of true 
probabilities. 

We fitted the simulated dataset using a nonspatial model, i.e., the standard 
logistic model; the centered autologistic model; the traditional SGLMM; the RHZ 
model; and our sparse model with varying degrees of sparsity — 400 eigenvectors 
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Table 2. The results of our simulation study for an SGLMM for 
binary data, i.e., for a Bernoulli first stage. Running times are 
given in minutes. 



Model 


Dim 


fa 


CI(/3i) 




CI(/3 2 ) 


f 


CI(r) 


\\P ~P\\ 


Time 


Nonspatial 




0.822 


(0.445, 1.199) 


0.852 


(0.475, 1.230) 






4.576 




Autologistic 




0.930 


(0.446, 1.397) 


0.915 


(0.448, 1.367) 






3.876 


20 


Traditional 


900 


2.149 


(0.254, 4.107) 


2.218 


(0.391, 4.132) 


2.803 


(0.287, 6.729) 


3.255 


2,455 


RHZ 


898 


1.079 


(0.626, 1.538) 


0.969 


(0.536, 1.410) 


0.949 


(0.481, 1.482) 


3.263 


1,640 


Sparse 


400 


1.013 


(0.569, 1.465) 


0.946 


(0.520, 1.382) 


1.425 


(0.355, 2.477) 


3.174 


285 


Sparse 


200 


1.014 


(0.582, 1.452) 


0.937 


(0.519, 1.359) 


1.052 


(0.365, 1.872) 


3.175 


167 


Sparse 


100 


0.997 


(0.566, 1.426) 


0.923 


(0.510, 1.342) 


0.935 


(0.339, 1.724) 


3.201 


80 


Sparse 


50 


0.982 


(0.562, 1.403) 


0.900 


(0.495, 1.304) 


0.945 


(0.317, 1.751) 


3.295 


49 


Sparse 


25 


0.964 


(0.552, 1.377) 


0.897 


(0.501, 1.301) 


0.867 


(0.225, 1.699) 


3.543 


37 



(the true model), 200 eigenvectors, 100, 50, and 25. The study results are shown 
in Table [2j and Figure [4] illustrates the inference for /3 with box plots. 

We see that the RHZ model and the true model produced approximately the same 
inference for these data. This is not surprising since the RHZ model is also capa- 
ble of fitting residual structure and has random effects that essentially "contain" 
the random effects for our sparse model, i.e., L<5 can accommodate any structure 
exhibited by M^g. The two models produced point estimates that are close to the 
true values, and confidence intervals that cover the true values but do not cover 0. 
The traditional SGLMM, on the other hand, gave a rather biased point estimate of 
(3 along with confidence intervals that are over four times as wide as those provided 
by the other SGLMMs. 

The standard nonspatial logistic model gave a biased estimate of (3, which we 
expect of a model that does not account for dependence. But the nonspatial model's 
confidence intervals did at least cover the true value of (3 but not 0. The autologistic 
estimate was also biased, albeit less so, and it too gave valid confidence intervals, 
but those intervals are slightly wider than those for the other models (except the 
traditional SGLMM). 

The box plots in Figure [5] and the level plots in Figure [6] show the effect of 
increasing sparsity on regression inference and on fit, respectively. As we apply 
our sparse SGLMM with fewer and fewer eigenvectors, the posterior distribution 
of (3 exhibits increasing bias and decreasing variance, i.e., we observe the expected 
bias-variance tradeoff. The inference and fit provided by our model did not suffer 
appreciably until the number of eigenvectors had been reduced to 25, which repre- 
sents a 97% reduction in the number of parameters relative to the traditional and 
RHZ models. 

Figure [7] shows the distributions of the estimated posterior correlations among 
the random effects for the three SGLMMs. We see that the random effects for 
the RHZ model and for our sparse model are at most weakly correlated, while the 
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Figure 4. Box plots illustrating inference for (3 for the simulated 
binary data. 
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Figure 5. Box plots illustrating the effect of increasing sparsity 
on regression inference for the binary data. 



random effects for the traditional model are often strongly dependent. This is why 
spherical normal proposals are sufficient for 5 and d$- 

6.2. Count Data. We created a count dataset by first simulating random effects 
according to 5s ~ A/"(0, (3Qs) _1 ) and then simulating independent observations 
according to Z | S$ ~ "P{X), where V denotes a Poisson random variable and A = 
cxp(cc + y + M.Ss) is the vector of true rates. 

The study results for the count data are shown in Table [3j The highest posterior 
density intervals for all three spatial models cover the true value of j3, while the 
standard Poisson regression model yielded erroneous inference for j3% because the 
model's failure to account for small-scale structure resulted in a poor point estimate. 
The traditional SGLMM produced a good point estimate of (3 for this dataset but 
once again gave much wider intervals than the other models. 

6.3. Normal Data. For the normal data we used the 20 x 20 lattice with the 
coordinates of the vertices again restricted to the unit square, and we used the 
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Figure 6. The true and fitted probabilities for the simulated bi- 
nary data, for various values of q = dim(<5s). 
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• • • Sparse 
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Posterior Correlation 



0.2 0.4 0.6 0.8 1.0 

X 



Figure 7. Posterior correlations among the random effects for the 
three SGLMMs. 
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Table 3. The results of our simulation study for an SGLMM for 
count data, i.e., for a Poisson first stage. Running times are given 
in minutes. 



Model 


Dim 




CI(/?i) 




CI(/? 2 ) 


f 


CI(r) 


l|A-A|| 


Time 


Nonspatial 




1.108 


(0.989, 1.226) 


1.179 


(1.061, 1.297) 






53.898 




Traditional 


900 


1.007 


(0.073, 1.913) 


1.056 


(0.117, 1.986) 


5.141 


(3.469, 6.952) 


26.039 


2,275 


RHZ 


898 


0.934 


(0.816, 1.051) 


1.092 


(0.975, 1.210) 


4.978 


(2.916, 7.324) 


26.172 


1,864 


Sparse 


400 


0.942 


(0.824, 1.060) 


1.101 


(0.986, 1.215) 


4.499 


(2.873, 6.402) 


24.379 


485 






Pi 






02 












Sparse RHZ Trad Nonspat 



Sparse RHZ Trad Nonspat 



Figure 8. Box plots illustrating inference for (3 for the simulated 
count data. 



first 180 eigenvectors of the corresponding Moran operator. We simulated random 
effects according to S$ ~ W(0, Qg 1 ) and then simulated observations according to 
Z | 8 S ~ W(/x = x + y + MS S , cr 2 I), where a 2 = 1. 

The study results for these data are shown in Table |4| The RHZ and sparse 
models gave narrower intervals than the ordinary linear model because the latter 
model overestimated a 2 . For this dataset the traditional SGLMM gave a poor 
estimate of /3 and inflated the variance so much as to cause a Type II error. We 
note that the running times for this study were much longer than they would have 
been for binary and count datasets of the same size because we did a Gibbs update 
of the random effects for the normal data, which requires an expensive Cholesky 
decomposition. 

6.4. A Heuristic for Dimension Reduction. We conducted a more extensive 
simulation study with the aim of providing a heuristic for unsupervised dimension 
reduction, i.e., dimension reduction that does not employ the data to be analyzed. 
For this study we used the 50 x 50 lattice, which is large but not so large as to make 
fitting the true model infeasible. We used the same regression component as before, 
i.e., X = [xy] and (3 = (1, 1)', and simulated binary datasets for four values of the 
smoothing parameter, r: 0.5, 1, 2, and 4. We used the first 1,100 eigenvectors of the 



14 



HUGHES AND HARAN 



Table 4. The results of our simulation study for an SLMM, i.e., 
for a Gaussian first stage. Running times are given in minutes. 



Model 


Dim 


h 


CI(jSl) 


02 


01(02) 


T 


CI(r) 


a 2 


CI(<r 2 ) 


IIm - All 


Time 


Nonspatial 




1.143 


(0.824, 1.462) 


0.925 


(0.606, 1.244) 






1.558 




14.204 




Traditional 


400 


1.583 


(-0.816, 3.984) 


0.288 


(-2.066, 2.656) 


0.825 


(0.417, 1.322) 


0.793 


(0.585, 1.312) 


9.798 


1,248 


RHZ 


398 


1.143 


(0.912, 1.377) 


0.925 


(0.692, 1.158) 


0.847 


(0.427, 1.365) 


0.802 


(0.595, 1.141) 


9.725 


1,660 


Sparse 


180 


1.143 


(0.884, 1.403) 


0.925 


(0.666, 1.185) 


0.961 


(0.525, 1.486) 


1.020 


(0.866, 1.229) 


8.692 


385 



Pi §2 




CM 

I 1 | I 1 

Sparse RHZ Trad Nonspat Sparse RHZ Trad Nonspat 



Figure 9. Box plots illustrating inference for (3 for the simulated 
Gaussian data. 



Moran operator, which correspond to standardized eigenvalues greater than 0.06 
and thus practically exhaust the patterns of positive dependence for this scenario. 
We then fit the true model and four sparser models to the simulated data. The 
number of random effects for the sparser models were 625 (the upper quartile of the 
eigenvalues), 265 (standardized eigenvalues greater than 0.7), 100, and 50. The last 
two of these were chosen with the hope of recommending a small, fixed dimension 
for the random effects. 

The results of the study are shown in Table [5] and Figure [10] Evidently, a very 
dramatic reduction in dimension comes at little cost to regression inference. In 
fact, since P\ are 02 are negatively correlated in this scenario, greater dimension 
reduction resulted in a better point estimate for (3 in many cases. And even using 
a mere 50 eigenvectors did not relocate any confidence interval enough to miss the 
true value of the parameter. Hence, 50-100 eigenvectors should suffice for most 
analyses, which removes evaluation of the above mentioned quadratic form as the 
chief computational burden in fitting the model. Should one wish to be more con- 
servative, we recommend the use of all eigenvectors corresponding to standardized 
eigenvalues greater than 0.7. For a square lattice this means using approximately 
10% of the eigenvectors. 
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Table 5. The effect of various levels of sparsity, for simulated 
binary data with r equal to 0.5, 1, 2, and 4. Running times are 
given in minutes. 



T 


Dim 




CI(/?i) 


h 


CI(/3 2 ) 


T 


CI(r) 


I|P-P|| 


Time 




1100 


0.862 


(0.581, 


1.147) 


1.166 


(0.865, 


1.463) 


0.665 


(0.439, 


0.912) 


6.379 


5,159 




625 


0.869 


(0.592, 


1.147) 


1.114 


(0.828, 


1.402) 


0.617 


(0.372, 


0.915) 


6.521 


1,836 


0.5 


265 


0.857 


(0.588, 


1.127) 


1.085 


(0.805, 


1.367) 


0.512 


(0.311, 


0.720) 


6.877 


597 




100 


0.852 


(0.588, 


1.117) 


1.035 


(0.766, 


1.307) 


0.464 


(0.253, 


0.676) 


7.424 


260 




50 


0.832 


(0.571, 


1.092) 


1.023 


(0.757, 


1.291) 


0.410 


(0.206, 


0.621) 


7.853 


113 




1100 


1.125 


(0.870, 


1.380) 


0.992 


(0.743, 


1.251) 


0.930 


(0.500, 


1.319) 


5.205 


6,067 




625 


1.097 


(0.847, 


1.344) 


0.971 


(0.724, 


1.221) 


1.056 


(0.585, 


1.687) 


5.294 


1,666 


1 


265 


1.086 


(0.840, 


1.337) 


0.962 


(0.715, 


1.212) 


0.793 


(0.404, 


1.198) 


5.418 


579 




100 


1.050 


(0.809, 


1.293) 


0.924 


(0.683, 


1.165) 


1.141 


(0.533, 


1.865) 


5.787 


169 




50 


1.037 


(0.796, 


1.275) 


0.920 


(0.679, 


1.161) 


1.031 


(0.435, 


1.726) 


6.005 


109 




1100 


1.046 


(0.800, 


1.293) 


0.988 


(0.739, 


1.239) 


2.234 


(0.676, 


3.789) 


4.134 


4,469 




625 


1.046 


(0.805, 


1.291) 


0.992 


(0.743, 


1.237) 


1.701 


(0.856, 


2.715) 


4.098 


1,926 


2 


265 


1.034 


(0.793, 


1.276) 


0.990 


(0.744, 


1.238) 


1.507 


(0.548, 


2.723) 


4.116 


552 




100 


1.018 


(0.780, 


1.254) 


0.966 


(0.726, 


1.210) 


2.031 


(0.616, 


4.163) 


4.230 


184 




50 


1.012 


(0.779, 


1.250) 


0.960 


(0.721, 


1.202) 


2.478 


(0.465, 


5.205) 


4.392 


131 




1100 


0.978 


(0.743, 


1.214) 


1.173 


(0.934, 


1.411) 


4.530 


(1.192, 


6.581) 


3.238 


4,698 




625 


0.986 


(0.750, 


1.224) 


1.142 


(0.903, 


1.380) 


3.470 


(1.246, 


7.501) 


3.417 


2,316 


4 


265 


0.972 


(0.733, 


1.205) 


1.133 


(0.898, 


1.374) 


2.839 


(1.103, 


5.186) 


3.506 


546 




100 


0.957 


(0.725, 


1.195) 


1.126 


(0.886, 


1.363) 


2.639 


(0.935, 


4.939) 


3.567 


249 




50 


0.951 


(0.716, 


1.185) 


1.122 


(0.885, 


1.360) 


2.296 


(0.697, 


4.580) 


3.652 


155 



7. Application to US Infant Mortality Data 



The plot in Figure 11 shows infant mortality data for 3,071 US counties. Each 
shaded circle represents a ratio of deaths to births, i.e., an empirical infant mortal- 
ity rate, for a given county. The data were obtained from the 2008 Area Resource 
File (ARF), a county-level database maintained by the Bureau of Health Profes- 
sions, Health Resources and Services Administration, US Department of Health 
and Human Services. Specifically, three variables were extracted from the ARF: 
the three-year (2002-2004) average number of infant deaths before the first birth- 
day, the three-year average number of live births, and the three-year average number 
of low birth weight infants. 
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Figure 10. Box plots illustrating inference for (3 for simulated 
binary data with various values for t and various degrees of spar- 
sity. 



To these data we fit our sparse Poisson SGLMM with 

(4) 

E(DEATHSi I (3 8s) = BIRTHSi e ^O+/8ilOW < +^2BLACK 1 +j93HISP < +^4Gimi+j85AFP < +^ 6 STAB 4 +M < *s 



where deaths is the number of infant deaths; births is the number of live births; 
low is the rate of low birth weight; BLACK is the percentage of black residents 
(according to the 2000 US Census); HISP is the percentage of hispanic residents 
(2000 US Census); GINI is the Gini coefficient, a measure of income inequality 
( Gini| 19211; AFF is a composite score of social affluence (Yang, Teng, and Haran 



2009); and STAB is residential stability, an average z-score of two variables from 



the 2000 US Census. We chose dim(<5s) = 100. Simulating a sample path of 
length 2,000,000 yielded Monte Carlo standard errors smaller than 0.005 for the 
regression coefficients, smaller than 0.1 for r, and on the order of 0.001 for the 
random effects. The analysis took just 3.5 hours. Fitting the RHZ model to these 
data took approximately fourteen days, partly because storing the sample paths for 



the 3,064 random effects required the use of a 46 GB file-backed matrix (Kane and 



Emerson 2010). Had we been able to store all results in RAM, fitting the RHZ 



model would have taken approximately four days. 

The results are given in Table[6j We see that all predictors were found significant. 
The posterior mean for r is much smaller than the prior mean of 1,000, which 
contraindicates a nonspatial analysis of these data. Moreover, five eigenvectors 
were found to be significant predictors. 
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Figure 11. A plot of infant mortality rates for 3,071 counties of 
the contiguous United States. The shaded circle for a given county 
represents that county's ratio of deaths to births, where births and 
deaths were averaged over the three-year period 2002-2004. A 
darker circle indicates a higher rate. 

Table 6. The results of fitting model Q to the infant mortality data. 



Predictor 


Parameter 


Estimate 


CI 


Intercept 


A) 


-5.452 


(-5.638, -5.268) 


LOW BIRTH WEIGHT 


Al 


8.719 


(7.475, 9.971) 


BLACK 


A> 


0.00428 


(0.00295, 0.00559) 


HISPANIC 


As 


-0.00405 


(-0.00515, -0.00291) 


GINI 


Ai 


-0.495 


(-0.930, -0.0610) 


AFFLUENCE 


A> 


-0.0774 


(-0.0896, -0.0654) 


STABILITY 


As 


-0.0292 


(-0.0439, -0.0144) 




T 


12.016 


(6.088, 19.116) 
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8. Discussion 

In this paper we developed a framework for reparameterization of the areal 
SGLMM. By accounting appropriately for the covariates of interest, our approach 
alleviates the spatial confounding that plagues the traditional SGLMM. And, by 
accounting for the underlying graph, our model (1) permits patterns of positive 
spatial dependence only and (2) allows for dramatic reduction in the dimension of 
the random effects. 

We studied the performance of our approach for three of the most commonly 
used first-stage models: Bernoulli, Poisson, and Gaussian. In all three cases our 
approach resulted in better regression inference and in considerable gains in the 
computational efficiency of the MCMC algorithms used for inference. A thorough 
followup simulation study indicates that a small, fixed number of random effects is 
sufficient for all datasets. More conservatively, our sparse model appears to require 
no more than O.ln random effects to perform well with respect to both regression 
and prediction. The resulting gain in computational efficiency will permit the 
relatively rapid analysis of datasets that were once considered too large for the 
areal SGLMM. 
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